A major emerging challenge is how to protect people's privacy as cameras and computer vision are increasingly integrated into our daily lives, including in smart devices inside homes. A potential solution is to capture and record just the minimum amount of information needed to perform a task of interest. In this paper, we propose a fully-coupled two-stream spatiotemporal architecture for reliable human action recognition on extremely low resolution (e.g., 12x16 pixel) videos. We provide an efficient method to extract spatial and temporal features and to aggregate them into a robust feature representation for an entire action video sequence. We also consider how to incorporate high resolution videos during training in order to build better low resolution action recognition models. We evaluate on two publicly-available datasets, showing significant improvements over the state-of-the-art.